Abstract

In the present paper we propose to describe gene networks in biological systems
using probabilistic algorithms. We describe gene duplication in the process of biological
evolution by introducing a replica procedure for probabilistic algorithms. We
construct examples of such a replica procedure for hidden Markov models. We
introduce a family of hidden Markov models where the set of hidden states is a finite
additive group with a p-adic metric and build the replica procedure for this family of
Markov models.



1 Introduction

Different methods of physics, in particular probabilistic methods, have found application in
genetics. In the present paper we introduce, for applications to the evolution of genomes, an
analogue of the replica procedure used in the statistical physics of disordered systems.
The replica method, cf. [1], [2], was applied to the description of states of disordered systems,
in particular spin glasses. In the present paper we propose to use replicas for the investigation
of gene duplication. One of the examples of this approach is based on the application of p-adic
numbers and probabilistic models related to p-adic mathematical physics [3].

One of the approaches to investigation of gene networks (genetic regulatory networks, 
or networks of interacting genes) and related metabolic networks in molecular biology, and 
also to investigation of gene regulation (regulation of gene expression) is the modeling of the 
mentioned networks using the corresponding system of kinetic equations. This system of 
kinetic equations describes metabolic reactions and levels of gene expression. An alternative 
approach to gene networks describes this network as a computational model which performs 
computations according to some algorithm (for example, a Boolean network). 

In the present paper we propose to describe a gene network as a probabilistic algorithm.
Let us recall that a probabilistic (or randomized) algorithm differs from a standard
(deterministic) algorithm as follows: a probabilistic algorithm performs commands with some
probability. Therefore a probabilistic algorithm depends on a set of parameters
(probabilities). One can put in correspondence to a probabilistic algorithm a system of kinetic
equations which describes the rate of operation of some commands of the algorithm. This
allows one to unify the kinetic and the algorithmic descriptions of a gene network.
In this approach the parameters of a probabilistic algorithm correspond to the levels of
gene expression. For a discussion of the theory of algorithms see [4], for an introduction to
probabilistic algorithms cf. [5]. Probabilistic and quantum algorithms with applications to some
problems of bioinformatics, in particular to sequence alignment, were discussed in [6]. For
a discussion of the analysis of genomes cf. [7] and for a review of gene networks cf. [8].

One of the motivations for the application of probabilistic algorithms to gene networks is the
evolvability of genomes. According to the theory of neutral evolution [9], the majority of
mutations (changes of a genome in the process of biological evolution) do not influence the
fitness of the corresponding organisms. From the point of view of a genome as an algorithm
this is not natural: random transformations of a program will break this program, i.e. will
transform an efficient algorithm into an inefficient one. Here an efficient algorithm is an
algorithm which is able to perform the needed computations; in application to biology this
corresponds to the genome of a biologically fit organism.

For a probabilistic algorithm continuous transformations of the parameters of the
algorithm are possible. In application to gene networks these transformations correspond to
variation of levels of gene expression. These transformations could be achieved by
substitutions in regulatory sequences of the genome.

Transformations of a genome in the process of evolution are not restricted to variation
of levels of gene expression. One of the important mechanisms of evolution is gene
duplication, cf. [10]. Under gene duplication some parts of the genome (for example,
the whole genome) can be duplicated several times (i.e. the new genome will contain several
copies of a part of the old genome). Sequences which are duplicates of some sequence
are called paralogous. After the duplication the different copies of a gene may work as
the initial gene, may be switched off, or may evolve and obtain new functions (the process
of duplication-specialization). Horizontal gene transfer can be considered as a particular
case of gene duplication in the union of genomes of different organisms.

In order to construct a model of biological evolution one has to describe the class of
probabilistic algorithms which correspond to gene networks and the family of transformations
which describe point mutations and gene duplication. These transformations, according to
the theory of neutral evolution, should transform an efficient algorithm into an efficient
algorithm with high probability. In the present paper we consider a model of gene duplication
for hidden Markov models.

We discuss the analogy between the phenomenon of gene duplication and the replica 
procedure which was applied in the statistical physics of complex systems (in particular, 
the replica method in the theory of spin glasses). The replica procedure transforms the 
Hamiltonian of a complex system to several copies (replicas) of this Hamiltonian. The 
analogous transformation is applied to the observables. The quenched state for the complex 
system in the framework of the replica approach is computed as a result of interaction of 
several replicas of this system, cf. [1], [2]. 

Analogously, gene duplication substitutes a part of a genome by several copies of this
part. In the genetic program these parts of the initial genome will work in parallel. If we
consider the genetic program as a probabilistic algorithm, then gene duplication looks like
a replica procedure for this algorithm. We arrive at the following problem: to introduce a
natural definition of the replica procedure for probabilistic algorithms. This procedure should
transform an efficient algorithm into an efficient algorithm with nonzero probability. The
different replicas will correspond to paralogous sequences (genes or regulatory sequences).

Let us note that for a general probabilistic algorithm there is no natural definition of
a replica procedure (and, if this procedure exists, it need not be unique). We also do
not claim that we build in this paper a realistic model of gene regulation for some existing
gene network. Our aim here is to describe some nontrivial examples of a replica procedure for
probabilistic algorithms.

In the present paper we consider some examples of replica procedures for hidden Markov
models, or HMM (a simple class of probabilistic algorithms). In particular, we build an
example of a hidden Markov model where the set of hidden states is a finite additive group
with the p-adic metric and introduce the replica procedure for this model. Let us note that
p-adic methods of description of the genetic code were developed in [11], [12], [13], [14], [15].

2 Replica procedure for HMM 

In the present section we introduce the definition of the replica procedure for hidden Markov 
models. 

A hidden Markov model (or HMM) $F(f(t))$ is a random function of a Markov chain. Here
$f$ is a Markov chain with discrete time (i.e. the time $t$ is a natural number) which takes
values in the finite set $X$ (the set of hidden states of the model), and $F : X \to Y$ is a
random map from the finite set $X$ of hidden states to the finite set $Y$ of output (or
production, or emission) states of the model. The map $F$ describes the family of emission
probabilities of the HMM.

Therefore a hidden Markov model is described by the maps

$$\mathbb{N} \xrightarrow{f} X \xrightarrow{F} Y$$

where the Markov chain $f$ is defined by the set of transition probabilities
$P_{xx'} : x \to x'$, $x, x' \in X$.
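
As a minimal sketch of this definition, the following Python fragment samples an output
sequence from an HMM given transition probabilities $P_{xx'}$ and emission probabilities. The
two hidden regimes, their names, and all numerical values are hypothetical and chosen only for
illustration; they are not taken from the paper.

```python
import random


def sample_hmm(transition, emission, x0, steps, rng):
    """Sample `steps` outputs of the HMM F(f(t)) started at hidden state x0.

    transition[x] maps a next hidden state x' to P_{xx'};
    emission[x] maps an output symbol y to the probability that F(x) = y.
    """
    x, outputs = x0, []
    for _ in range(steps):
        # Emit a symbol from the current hidden state.
        symbols, weights = zip(*emission[x].items())
        outputs.append(rng.choices(symbols, weights=weights)[0])
        # Move the hidden Markov chain one step.
        states, probs = zip(*transition[x].items())
        x = rng.choices(states, weights=probs)[0]
    return outputs


# Hypothetical two-regime model over the nucleotide alphabet.
transition = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.2, "GC_rich": 0.8},
}
emission = {
    "AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}

sequence = sample_hmm(transition, emission, "AT_rich", 20, random.Random(0))
print("".join(sequence))
```

Here the hidden state plays the role of the regime of sequence generation, and only the emitted
symbols are observed.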

The replica procedure for hidden Markov models introduced in the present paper is
defined by replacing the set $X$ of hidden states of the model by the
direct product $X \times R$, where $R$ is a finite set (the set of replicas).

We discuss the following biological interpretation. A hidden Markov model generates a
biological sequence (for example, a DNA sequence). The set $Y$ is the set of possible elements
of this biological sequence (for DNA this will be the set of nucleotides {A, T, G, C}), and
the set $X$ is the set which describes the different regimes of generation of sequences. The
replica procedure is a model of gene duplication: the different replicas correspond to the
different paralogous sequences (genes or regulatory sequences).

Let us describe the hidden Markov model for the replica symmetric case. Replica
symmetry in our approach corresponds to gene duplication in the case of neutral evolution.
Neutral evolution transforms a probabilistic algorithm into an equivalent probabilistic
algorithm (i.e. an algorithm which generates the same results with the same probabilities).
This kind of hidden Markov model is described by the composition of maps

$$\mathbb{N} \xrightarrow{\tilde f} X \times R \xrightarrow{\tilde F} Y$$

where $\tilde f$ is a Markov chain with the set of transition probabilities

$$P_{(x,r);(x',r')} = P_{xx'}\bigl(\delta_{rr'} + c(1 - \delta_{rr'})\bigr), \qquad c \in [0, 1].$$

The transition between different replicas $r$, $r'$ has a probability which is
proportional to the coefficient $c$.

The map $\tilde F$ for the replica symmetric case is given by the formula

$$\tilde F(x, r) = F(x),$$

i.e. this random map does not depend on the replica index $r$ (a copy of a gene in the set
of paralogs). Therefore the replica symmetric hidden Markov model $\tilde F(\tilde f(t))$ is
equivalent to the initial hidden Markov model $F(f(t))$. This means that $\tilde F(\tilde f(t))$
generates (statistically) the same sequences of elements of $Y$ as $F(f(t))$. In particular,
the described replica procedure maps an efficient HMM to an efficient HMM.
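
The equivalence claimed above can be checked numerically on a small example. The sketch below
builds the replicated transition probabilities on $X \times R$ and compares, with exact rational
arithmetic, the probabilities of all short output sequences before and after replication. Two
points are assumptions of this sketch rather than statements of the paper: the replica weights
$\delta_{rr'} + c(1 - \delta_{rr'})$ are divided by $1 + c(|R| - 1)$ so that each row sums to one,
and the initial distribution is spread uniformly over replicas. The two-state model and its
numbers are hypothetical.

```python
from fractions import Fraction
from itertools import product


def replicate(P, n_replicas, c):
    """Replica-symmetric transition probabilities on X x R.

    Uses the weights delta_{rr'} + c(1 - delta_{rr'}), divided by
    1 + c(|R| - 1) so that rows sum to one (normalization assumed here).
    """
    norm = 1 + c * (n_replicas - 1)
    states = [(x, r) for x in range(len(P)) for r in range(n_replicas)]
    trans = {(s, t): P[s[0]][t[0]] * (1 if s[1] == t[1] else c) / norm
             for s in states for t in states}
    return states, trans


def sequence_probability(states, trans, emit, init, seq):
    """Forward algorithm: probability that the HMM emits the sequence `seq`."""
    alpha = {s: init[s] * emit(s, seq[0]) for s in states}
    for y in seq[1:]:
        alpha = {t: sum(alpha[s] * trans[(s, t)] for s in states) * emit(t, y)
                 for t in states}
    return sum(alpha.values())


# Hypothetical two-state model with exact rational probabilities.
P = [[Fraction(9, 10), Fraction(1, 10)], [Fraction(1, 5), Fraction(4, 5)]]
E = [{"A": Fraction(3, 4), "G": Fraction(1, 4)},
     {"A": Fraction(1, 4), "G": Fraction(3, 4)}]
n_replicas, c = 3, Fraction(1, 3)

states0 = [0, 1]
trans0 = {(x, y): P[x][y] for x in states0 for y in states0}
init0 = {x: Fraction(1, 2) for x in states0}

states1, trans1 = replicate(P, n_replicas, c)
init1 = {s: init0[s[0]] / n_replicas for s in states1}

# The emission of the replicated model ignores the replica index r.
same_statistics = all(
    sequence_probability(states0, trans0, lambda x, y: E[x][y], init0, seq)
    == sequence_probability(states1, trans1, lambda s, y: E[s[0]][y], init1, seq)
    for seq in product("AG", repeat=4)
)
print(same_statistics)
```

With this normalization, summing the replicated transition probability over the target replica
$r'$ returns $P_{xx'}$, which is why the output statistics coincide.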

In general one can consider a replica procedure with broken replica symmetry. In this
case the emission map on $X \times R$ will depend on the replica index $r$. Models with broken
replica symmetry will describe the specialization of genes after duplication. A hidden Markov
model with broken replica symmetry will not be equivalent to the initial hidden Markov model.

3 The p-adic HMM 

In the present section we discuss hidden Markov models where the sets of hidden states 
are hierarchical (i.e. are described by some ultrametric spaces). The simplest example of a 
hierarchical Markov model has the form 

$$\mathbb{N} \xrightarrow{f} X \xrightarrow{F} Y$$

where: 

1) The set $Y$ of output (or production, or emission) states of the Markov model is a finite
set, which in the model under consideration is taken to be the set of nucleotides.
We consider the 2-adic parametrization of the set $Y = \{A, U, G, C\}$ introduced in [11], [12],
i.e. the parametrization of $Y$ by the space $\mathbb{F}_2^2$ (the 2-dimensional space over the
field of residues modulo 2). In the 2-adic approach the nucleotides are parametrized by pairs
of 0 and 1 as follows:

$$\begin{array}{cc} A & G \\ U & C \end{array} \qquad\qquad \begin{array}{cc} 00 & 01 \\ 10 & 11 \end{array}$$


2) The set $X$ of hidden states of the Markov model is an ultrametric space. In the
example under consideration $X = \mathbb{Z}/2^N\mathbb{Z}$, $N > 0$, i.e. the set $X$ is the
additive group of residues modulo $2^N$ with the naturally defined 2-adic metric. We consider
the Haar measure on the group $X$, normalized in such a way that the measure of the
group $X$ is equal to one.

3) The map $f$ is a Markov chain taking values in the set $X$ of hidden states. The family
of transition probabilities of this Markov chain describes the discrete 2-adic diffusion, i.e.
has the form

$$P_{xy} = q(|x - y|_2), \qquad q(\cdot) > 0, \qquad \int q(|x|_2)\,dx < 1, \tag{1}$$

(where the integral is taken with respect to the mentioned Haar measure).
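
To make the discrete 2-adic diffusion concrete, the following sketch computes the 2-adic norm
on $\mathbb{Z}/2^N\mathbb{Z}$ and builds a transition matrix whose entries depend only on
$|x - y|_2$. The particular weight function and the normalization of rows to one are assumptions
made for illustration; the paper does not fix a specific $q$.

```python
from fractions import Fraction


def norm2(x, N):
    """2-adic norm on Z/2^N Z: |x|_2 = 2^(-v), where 2^v is the largest
    power of two dividing x, and |0|_2 = 0."""
    x %= 2 ** N
    if x == 0:
        return Fraction(0)
    valuation = (x & -x).bit_length() - 1
    return Fraction(1, 2 ** valuation)


def diffusion_matrix(N, weight):
    """Transition matrix P[x][y] = q(|x - y|_2), with q equal to `weight`
    rescaled so that every row sums to one."""
    size = 2 ** N
    total = sum(weight(norm2(z, N)) for z in range(size))
    return [[weight(norm2(x - y, N)) / total for y in range(size)]
            for x in range(size)]


def weight(d):
    """Hypothetical weight: jumps to 2-adically closer points are more likely."""
    return Fraction(1) / (1 + d)


N = 3
P = diffusion_matrix(N, weight)
print([str(p) for p in P[0]])
```

Because $|x - y|_2$ is invariant under shifts of the group, every row of the matrix is a
permutation of the first one, so a single normalization constant suffices.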

4) The random map $F : X \to Y$ is constructed as follows. To any ball $J \subset X$
(including balls of zero diameter, i.e. points) we put in correspondence the characteristic
function $\chi_J$ of this ball. This function takes values in $\mathbb{F}_2$ (i.e. it is equal
to one in the ball and to zero outside the ball, where one and zero are considered as elements
of $\mathbb{F}_2$). We also put in correspondence to a ball $J$ the random variable $\phi_J$
taking values in $\mathbb{F}_2$, where $\phi_J$ is equal to 1 with probability $p_J$, which
depends on the ball $J$ and is a monotonically increasing function of the ball (i.e. for
$I \supset J$ one has $p_I > p_J$)^1. Let the random variables $\phi_J$ for different $J$ also
be independent.

Let us consider for the point $x \in X$ the maximal increasing sequence $\{J\}$ of balls which
contain $x$ (i.e. the minimal ball in this sequence is $x$ and the maximal ball is $X$). The map
$F_1 : X \to \mathbb{F}_2$ puts in correspondence to a point $x \in X$ the random element of
$\mathbb{F}_2$ which is constructed as follows:

$$F_1(x) = \sum_J \phi_J \chi_J(x). \tag{2}$$

The summation runs over the sequence of balls $\{J\}$, $x \in J$.

The map $F : X \to \mathbb{F}_2^2$ is constructed as the sum of the two independent maps $F_1$
acting at each of the coordinates in $\mathbb{F}_2^2$.
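
The construction of the random emission map can be sketched as follows. One joint draw of the
independent ball variables gives one realization of $F_1$ on all of $X$, and two independent
realizations give $F$, decoded with the parametrization of nucleotides by pairs of 0 and 1 from
item 1. The labelling of balls by pairs `(k, a)`, the helper names, and the choice of $p_J$ as
one half of the Haar measure of the ball are assumptions of this sketch.

```python
import random
from fractions import Fraction

# Pairs in F_2^2 decoded as nucleotides (A = 00, G = 01, U = 10, C = 11).
CODE = {(0, 0): "A", (0, 1): "G", (1, 0): "U", (1, 1): "C"}


def balls_containing(x, N):
    """Chain of balls in Z/2^N Z containing x, from X (k = 0) down to the
    point {x} (k = N). The ball (k, a) is {y : y = a mod 2^k}; its Haar
    measure is 2^(-k)."""
    return [(k, x % 2 ** k) for k in range(N + 1)]


def sample_phi(N, rng, scale=Fraction(1, 2)):
    """Draw one independent variable per ball, equal to 1 with probability
    p_J = scale * (Haar measure of J). This choice of p_J is hypothetical."""
    return {(k, a): int(rng.random() < scale * Fraction(1, 2 ** k))
            for k in range(N + 1) for a in range(2 ** k)}


def F1(x, phi, N):
    """Formula (2): sum over the balls containing x, taken modulo 2
    (the characteristic function equals one on exactly those balls)."""
    return sum(phi[J] for J in balls_containing(x, N)) % 2


def F(x, phi_pair, N):
    """Pair of independent copies of F1, decoded as a nucleotide."""
    return CODE[tuple(F1(x, phi, N) for phi in phi_pair)]


N = 3
rng = random.Random(1)
phi_pair = (sample_phi(N, rng), sample_phi(N, rng))
outputs = [F(x, phi_pair, N) for x in range(2 ** N)]
print(outputs)
```

Points that are close in the 2-adic metric share most of their balls, so their emissions differ
only through the variables attached to the few small balls that separate them.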

Let us discuss an example of the replica procedure for the hierarchical hidden Markov model
introduced in the present section. The replication of the set $X$ of hidden states of
the model (the transition from $X$ to $\tilde X$) can in this case be related to the map of
taking the residue mod $2^N$ (i.e. the projection)

$$\tilde X \to X : \quad \mathbb{Z}/2^M\mathbb{Z} \to \mathbb{Z}/2^N\mathbb{Z}, \qquad M > N.$$

Therefore the set $\tilde X$ differs from the set $X$ at small distances, i.e. points of $X$
correspond to balls in $\tilde X$ of diameter $2^{-N}$. Each of these balls contains $2^{M-N}$
points. As a set, $\tilde X$ is in one-to-one correspondence with the direct product of $X$ and
a set consisting of $2^{M-N}$ elements (this allows one to compare the above definition of
$\tilde X$ with the definition of the replica procedure for hidden Markov models in the
previous section).
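
The correspondence between the enlarged set of hidden states and the product of the original
set with a set of replicas can be illustrated directly. In the sketch below a residue modulo
$2^M$ is split into its residue modulo $2^N$ and a replica index given by the remaining high
bits; this particular choice of bijection and the function names are assumptions made for
illustration.

```python
from fractions import Fraction


def norm2(x, M):
    """2-adic norm on Z/2^M Z; x & -x is the largest power of 2 dividing x."""
    x %= 2 ** M
    if x == 0:
        return Fraction(0)
    return Fraction(1, x & -x)


def split(x_tilde, N):
    """Map a residue mod 2^M to (projection mod 2^N, replica index)."""
    return x_tilde % 2 ** N, x_tilde >> N


def join(x, r, N):
    """Inverse of split: rebuild the residue from the pair (x, r)."""
    return x + (r << N)


M, N = 5, 3
pairs = [split(x_tilde, N) for x_tilde in range(2 ** M)]

# Fibre of the projection over each point x of X = Z/2^N Z.
fibres = {x: [join(x, r, N) for r in range(2 ** (M - N))] for x in range(2 ** N)}

# Largest 2-adic distance between two points of the same fibre.
diameters = {x: max(norm2(a - b, M) for a in pts for b in pts)
             for x, pts in fibres.items()}
print(diameters[0])
```

Each fibre has $2^{M-N}$ points and is a ball of diameter $2^{-N}$, while points of different
fibres are separated by a strictly larger distance.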

The map $\tilde f$ is constructed by extension of the transition probability $q(\cdot)$ of the
Markov chain $f$ to small distances, i.e. $\tilde f$ corresponds to the transition probability
$\tilde q(\cdot)$, where $\tilde q(x) = q(x)$ for $|x|_2 > 2^{-N}$, and for $|x|_2 < 2^{-N}$ the
transition probability $\tilde q(\cdot)$ is defined in some arbitrary way (taking into account
the conditions mentioned in formula (1)).

Analogously, the map $\tilde F$ is built from the map $F$ by extension to smaller distances. We
extend the set of independent random variables $\{\phi_J\}$ by random variables corresponding to
balls with diameters satisfying $|J|_2 < 2^{-N}$ (taking into account the above-mentioned
properties of the set $\{\phi_J\}$). We define the map $\tilde F_1$ by formula (2) (which now
contains contributions from smaller balls), and define the map $\tilde F$ as a pair of
independent $\tilde F_1$.

We have constructed a natural replica procedure for the described example of a hierarchical
hidden Markov model. Let us note that the replica procedure introduced in the present section
in general differs from the replica symmetric case considered in the previous section,
since the maps $\tilde f$ and $\tilde F$ defined in this section do not necessarily coincide
with the replica symmetric maps of the previous section.

^1 For example, one can put $p_J$ to be proportional to the Haar measure of the ball $J$.